% !TeX root = main.tex

\section{Materials and Methods}
\label{M_and_M}

\subsection{Definition of the volume expansion factor}

In this work, we focused on a dimensionless volume expansion factor ($VEF$), defined as the total wood volume of a tree ($Vtot$), i.e., the volume of the stem and all branches whatever their diameter, divided by the volume of the part of the stem with a diameter greater than 7 cm ($Vstem7$) (equation \eqref{equa3}). Hence, with this definition, $VEF > 1$. The stem top diameter of 7 cm corresponds to the limit used by the French National Forest Inventory and by five other European countries \citep{Gschwantner2009}.
\begin{equation}
   Vtot = VEF \cdot Vstem7
   \label{equa3}
\end{equation}
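
As a purely illustrative example with hypothetical values (not taken from the datasets described below), a tree with $Vstem7 = 0.80$ m$^{3}$ and $VEF = 1.25$ would have a total wood volume of
\[
   Vtot = 1.25 \times 0.80~\mathrm{m^{3}} = 1.00~\mathrm{m^{3}},
\]
i.e., 0.20 m$^{3}$ of wood located in the branches and in the stem top below 7 cm in diameter.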

\subsection{Biological material}

Two distinct datasets were used in this study:

Dataset \#1 was derived from handwritten sheets of volume measurements collected between 1920 and 1985 in the network of permanent plots established by the French forest administration.
From this material, a first set of 4619 tree sheets was encoded and used by \cite{Vallet2006} to develop models of total aboveground volume for seven species.
A large data encoding and verification effort was carried out by CRDPI (Republic of Congo) within the framework of the French ANR EMERGE project, which aims to evaluate the available forest biomass in France \citep{Deleuze2010, Rivoire2010}. The current database includes 44668 trees on which tree circumference, height and $Vstem7$ were measured. The total wood volume $Vtot$ was measured on about one quarter of these trees. These measurements were performed according to the protocol of \cite{Oudin1930} (summarised in \cite{Vallet2006}). The volumes of the stem and of branches above 7 cm in diameter were derived from circumference measurements taken at one-meter intervals.
The volume of small branches (below 7 cm in diameter) was computed from weight measurements, assuming a green density of 1000 kg\,m$^{-3}$. After removing trees with uncertain species identification or with incomplete or obviously incorrect data, and trees with a diameter at breast height (DBH) below 7 cm, for which $Vstem7$ is not meaningful, the final dataset \#1 included 8192 trees from 10 genera and 19 species (Table \ref{tableEssences1}).
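
The weight-to-volume conversion used for small branches amounts to dividing the fresh mass by the assumed green density. As an illustration, with a hypothetical mass $m = 25$ kg (not a measured value) and with the symbols $m$, $\rho$ and $V$ introduced here only for this example, the corresponding volume would be
\[
   V = \frac{m}{\rho} = \frac{25~\mathrm{kg}}{1000~\mathrm{kg\,m^{-3}}} = 0.025~\mathrm{m^{3}}.
\]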

\begin{center}
*****Table \ref{tableEssences1} about here*****
\end{center}

Dataset \#2 was used for model validation. It was assembled during the above-mentioned EMERGE project with the main objective of complementing the available data with under-represented species, regions or values of the circumference at breast height ($C130$). A total of 209 trees\footnote{45 stems of coppiced \textit{Eucalyptus} trees were not considered here since the total volume of the clump was not measured.} were measured in 2009 and 2010. After removing missing data and trees with DBH below 7 cm, the final dataset \#2 included 176 trees from 11 genera and 13 species (Table \ref{tableEssences2}). Thirty-one \textit{Fagus sylvatica} trees and 23 \textit{Acer pseudoplatanus} trees were sampled within a high forest located along a soil gradient: soil \#1 was alocrisol, soil \#2 was oligo-saturated brunisol, soil \#3 was rendisol-calcisol and soil \#4 was rendosol. Among 12 \textit{Quercus petraea/robur} trees coming from a high forest, six were selected because they had a large fork.

\begin{center}
*****Table \ref{tableEssences2} about here*****
\end{center}

Only five species are common to both datasets. Compared with dataset \#1, dataset \#2 included fewer trees but more detailed measurements on each tree (such as wood density and nutrient content by compartment, T-LiDAR measurements, maturation strains, etc.). Moreover, a basic description of the silvicultural system (high forest, coppice, coppice-with-standards, etc.) was recorded in dataset \#2, whereas no stand information was available in dataset \#1. Since heights were not measured on standing trees in dataset \#1, the length above the stump measured on the felled tree was used as a surrogate for the total height $H$ in both datasets.

In this study, dataset \#1 was used for calibration and cross-validation of the models. Dataset \#2 was used for independent validation of the models, including on eight species not used for calibration.

\subsection{Modeling the volume expansion factor}
In this section, we describe step by step the modeling approach that we developed on the basis of the recommendations of \cite{Zuur2009} and \cite{Pinheiro2000}. The goodness of fit of the different models was measured by the Akaike information criterion (AIC) and by computing the RMSE and relative RMSE \citep{Mayer1993} of the $VEF$ and $Vtot$ predictions. The models were evaluated by 10-fold cross-validation on dataset \#1 and by predicting $VEF$ on a completely independent dataset (dataset \#2). The same methodology can readily be applied to other definitions of expansion factors.
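
For reference, these criteria are commonly written as follows. The notation is introduced here for illustration only: $y_{k}$ and $\hat{y}_{k}$ denote the observed and predicted values ($VEF$ or $Vtot$) for tree $k$, $n$ the number of trees, $\bar{y}$ the mean of the observed values, $p$ the number of estimated parameters and $\hat{L}$ the maximized likelihood. The normalization of the relative RMSE by the observed mean is an assumption, since other conventions exist.
\[
   \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right)^{2}}, \qquad
   \mathrm{RMSE}_{\mathrm{rel}} = \frac{\mathrm{RMSE}}{\bar{y}}
\]
\[
   \mathrm{AIC} = 2p - 2 \ln \hat{L}
\]
With this definition of the AIC, lower values indicate a better trade-off between goodness of fit and number of parameters.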

On the basis of the data visualization (Fig. \ref{VEF_vs_C130}), we chose equation \eqref{equa4} to model the $VEF$ as a function of $C130$ (in cm). With this equation, the estimated $VEF$ was forced to be $> 1$ to ensure that $Vtot > Vstem7$. The first part of the equation, $\exp(\beta_{1}-C130)^{\beta_{2}}$, was used to model the strong decrease of $VEF$ that was observed for small-diameter trees, and the second part, $\exp(\beta_{3}) \cdot C130 + 1$, was used to model the slight linear increase of $VEF$ that was observed for bigger trees (the slope $\exp(\beta_{3})$ is necessarily positive). Inspection of the model residuals (not shown), confirmed by the AIC, indicated that the model could be refined by including the tree height $H$ (in m), which led to equation \eqref{equa5}. The resulting model fitted the data better, but one drawback was that the meaning of the slope parameter changed.

\begin{center}
*****Figure \ref{VEF_vs_C130} about here*****
\end{center}
\begin{equation}
   VEF = \exp(\beta_{1}-C130)^{\beta_{2}} + \exp(\beta_{3}) \cdot C130 + 1
   \label{equa4}
\end{equation}
\begin{equation}
   VEF = \exp(\beta_{1}-C130)^{\beta_{2}} + \exp(\beta_{3}) \cdot \frac{C130}{H^{2}} + 1
   \label{equa5}
\end{equation}
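
The constraint $VEF > 1$ follows directly from the form of equations \eqref{equa4} and \eqref{equa5}. As a short verification, assuming only that $C130 > 0$ and $H > 0$: the exponential of a real number is strictly positive, and so is any real power of it, hence
\[
   \exp(\beta_{1}-C130)^{\beta_{2}} > 0
   \quad \mathrm{and} \quad
   \exp(\beta_{3}) \cdot \frac{C130}{H^{2}} > 0 ,
\]
so that
\[
   VEF - 1 = \exp(\beta_{1}-C130)^{\beta_{2}} + \exp(\beta_{3}) \cdot \frac{C130}{H^{2}} > 0
\]
for any real values of $\beta_{1}$, $\beta_{2}$ and $\beta_{3}$. The same argument applies to equation \eqref{equa4}, with $C130$ in place of $C130 / H^{2}$.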

The \textit{nlsList} function from the \textit{nlme} package of the R statistical software \citep{R2011} was used to fit a separate model for each genus. Because fitting separate models requires too many parameters, these fits were used only to decide which parameters needed random effects. We hypothesized that the true inter-genus variability existing in the French forest resource was well represented in our calibration dataset (at least for French temperate species). Modeling the inter-genus variability as a random effect was also useful for providing reliable estimates of species-specific parameters through best linear unbiased prediction (BLUP) \citep{Robinson1991}, especially when the number of trees was too low for a given genus or when the complete range of $C130$ was not represented. The mixed model (Equation \eqref{equa7}) was fitted with the \textit{nlme} function of the R software, and the model parameters were estimated by maximum likelihood. The results of the \textit{nlsList} fitting (Fig. \ref{IC_parameters_nlsList}) showed that parameter $\beta_{3}$ varied strongly among genera, and more precisely between the angiosperm and gymnosperm groups. Parameters $\beta_{1}$ and $\beta_{2}$ were much less variable. Moreover, in the case of parameter $\beta_{1}$, the variability was probably due to a lack of small-diameter trees for some species. Consequently, we chose to model random effects for parameters $\beta_{2}$ and $\beta_{3}$ and to keep only a fixed effect for parameter $\beta_{1}$. More precisely, the parameter $\beta_{3}$ was modeled as $\beta_{3}' \cdot G + \beta_{4} + b_{4i}$, where $G$ is a binary variable indicating whether the genus belongs to the angiosperm ($G = 1$) or gymnosperm ($G = 0$) group, $\beta_{3}'$ and $\beta_{4}$ are fixed parameters and $b_{4i}$ is a random effect on the intercept.
The correlation between random effects $b_{2i}$ and $b_{4i}$ was not statistically significant (correlation of 0.29 with a 95\% confidence interval of $[-0.36; 0.75]$), so we defined the corresponding variance-covariance matrix $\psi$ as diagonal. Finally, plotting the model residuals against the fitted values showed that the within-group variance increased with the fitted values; high fitted $VEF$ values corresponded to small $C130$ values. To account for this heteroscedasticity, we modeled the within-group variance as a power function of $C130$, with a different power for angiosperms and gymnosperms. The final model is given by Equation \eqref{equa7}. Its AIC was $-11832$.
\begin{equation}
   VEF_{ij} = \exp(\beta_{1}-C130_{ij})^{\beta_{2}+b_{2i}} + \exp(\beta_{3}' \cdot G_{i} + \beta_{4} + b_{4i}) \cdot \frac{C130_{ij}}{H_{ij}^{2}} + 1 + \epsilon_{ij}
   \label{equa7}
\end{equation}
where $i$ represents the genus and $j$ represents the tree within the genus. $\beta_{1}$, $\beta_{2}$, $\beta_{3}'$ and $\beta_{4}$ are the fixed parameters of the model, and $b_{2i}$ and $b_{4i}$ are the random effects. $b_{2i}$ and $b_{4i}$ are assumed to be normally distributed with mean 0 and a diagonal variance-covariance matrix $\psi$. The within-group errors $\epsilon_{ij}$ are assumed to be normally distributed with mean 0 and variance $\sigma^{2} \cdot |C130_{ij}|^{2\delta_{G_{i}}}$.
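
To make the variance function more concrete, the residual standard deviation implied by this assumption is
\[
   \mathrm{sd}(\epsilon_{ij}) = \sigma \cdot |C130_{ij}|^{\delta_{G_{i}}} .
\]
As an illustration (the subscripts $a$ and $b$ are introduced here only to denote two hypothetical trees of the same group), the ratio of residual standard deviations between two trees is
\[
   \frac{\mathrm{sd}(\epsilon_{a})}{\mathrm{sd}(\epsilon_{b})} = \left( \frac{C130_{a}}{C130_{b}} \right)^{\delta_{G}} ,
\]
so that doubling $C130$ multiplies the residual standard deviation by $2^{\delta_{G}}$. A negative value of $\delta_{G}$ therefore corresponds to a residual spread that decreases as $C130$ increases, and $\delta_{G} = 0$ to a constant variance.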

For practical reasons, since $C130$ is easier to measure in the field than tree height $H$, we also proposed a model based on $C130$ only. This model had exactly the same characteristics as the model presented in equation \eqref{equa7}, except that the variable $H$ was removed from the equation. The AIC of this model was $-9627$.

\begin{center}
*****Figure \ref{IC_parameters_nlsList} about here*****
\end{center}   


